Abstract

This paper compares three image stabilization algorithms when used as preprocessors for a target tracking
application. These algorithms vary in computational complexity, accuracy, and ability. Algorithm 1 is
capable of only pixel-level realignment of imagery, while Algorithms 2 and 3 are capable of full subpixel
stabilization with respect to translation, rotation, and scale. The algorithms are evaluated on their performance
in the stabilization of one synthetic forward-looking infrared (FLIR) data set and two real FLIR imagery
data sets. The evaluation tools include the mean square error of the output data set and the overall
performance of an automatic target acquisition system (developed at the Army Research Laboratory)
that uses the algorithms as a front-end preprocessor. We find that for this tracking application, extremely
accurate subpixel stabilization is a requirement for proper operation. We also find that in this application,
Algorithm 3 performs significantly better than the other two algorithms.


*The partial support of the Advanced Research Projects Agency (ARPA Order No. A422) and the Army Research Office
under Grant DAAH04-93-G-0419 is gratefully acknowledged, as is the help of Sandy German in preparing this paper.

1.  Introduction

What is image stabilization? According to Webster’s New World Dictionary [1], the word “stable” means
“capable of returning to equilibrium or original position after having been displaced; steady; fixed.” Thus
stabilization may be defined as the act of making something return to equilibrium or its original position after
being displaced. When applied to imagery, stabilization refers to modifying an image sequence that comes from a
moving or jittering camera so that the image appears stable or stationary.

Image stabilization technology has already made its way into the commercial marketplace. It may be found in
state-of-the-art research systems such as the “VFE” from David Sarnoff Research Center and in consumer products
such as home video cameras. The VFE has been used to provide stabilized video from off-road robotic vehicles,
while home cameras are capable of removing high-frequency jitter which may be caused by an unsteady hand at the
camera. Stabilization techniques may be broken down into two broad application areas, which involve
human-observed imagery and computer-observed imagery, respectively.

Human-observed imagery refers to imagery that is intended for viewing by a human observer. This includes the
home video recorder, as well as imagery that is sent back to a robotic control station (RCS). RCS imagery may be
viewed to permit robotic tele-operation/tele-driving or human-assisted target acquisition/recognition. In these cases,
imagery from a moving off-road vehicle may be too jittery for a human to easily or comfortably observe. Image
stabilization may be used to remove this motion and create a more viewable image for the observer. Human-observed
imagery may also come from stationary cameras that are panned in order to search a large region. Precise image
stabilization would provide the information necessary to allow an “image mosaic” of the entire scene to be created.
In an image mosaic, the camera’s entire panned region is displayed on the monitor. As new updated imagery is
received from the camera, it is pasted into the appropriate region of the mosaic. This allows for continuous viewing of
background contextual information around the actual live camera image.

Computer-observed imagery includes the broad class of computer vision algorithms that may be applied to an
incoming video stream. This includes such widely varied applications as automatic target acquisition (ATA),
automatic target recognition, autonomous mobility, and obstacle avoidance. Since a rigid computer algorithm is now
“looking” at the incoming imagery, we no longer have the benefit of the human visual system as a post-processor. In
fact, it will be shown that even subpixel misalignment between image frames is too much for certain ATA
algorithms to handle.

This paper concentrates on the performance characterization of several stabilization algorithms with respect to
how well they work in conjunction with a post-processing algorithm such as tracking. The Army Research
Laboratory (ARL) has been working in the area of military robotics for many years. In cooperation with the Advanced
Research Projects Agency (ARPA), ARL has fielded several computer vision systems for use in military field
experiments. These platforms provide a source of known, robust algorithms in areas such as ATA. The ATA system
which ARL has fielded is designed to work with imagery taken by a stable fixed camera. For this reason, it was felt
that the ARL tracker performance would provide a good measure of the quality of a stabilized image sequence.

The ARL tracker output displays a box around any object which the tracker has determined to be a target
vehicle. This target box has an associated target identifier. Under normal operating conditions (i.e., a stable input
video sequence), this target identifier remains constant throughout the tracking sequence, and the surrounding box is
slightly larger than the actual target (the percentage larger is a user-controlled parameter that guarantees complete
target segmentation). In order for the tracker to perform the above operations, it must accurately segment the target
from the background as well as be able to determine the target’s motion characteristics. Image instabilities that are
not compensated for will adversely affect the tracker’s ability to perform these operations. Therefore, these
measures, along with the time to acquire a target and the false alarm rate, will be used to judge the various stabilization
algorithms.

Three different stabilization algorithms are addressed in this paper. The first algorithm (Projection Algorithm)
was developed by ARL to compensate for wind loading on one of their unmanned robotic platforms. This algorithm
is the least computationally expensive, and is only capable of compensating for integer image translations. The
second algorithm (Feature Tracking Algorithm (FTA) 1) is a feature tracking algorithm developed by the University of
Maryland. This algorithm is capable of full stabilization in translation, rotation, and scale, and provides for subpixel
accuracy. This algorithm stabilizes frame n to frame n-1 and therefore has a very short memory. The final algorithm
(Feature Tracking Algorithm 2), which is the most computationally expensive, was also developed at the University
of Maryland. This algorithm is an extension of FTA 1 and requires a longer image memory.

The organization of this paper is as follows: Section 2, Algorithm Overviews, presents an overview of the three
stabilization algorithms studied, as well as an overview of the tracking system. Section 3, Algorithm
Characterization, provides an overview of the tracking system performance criteria as well as a definition of the performance
baseline and evaluation data sets. Sections 3.3 to 3.5 discuss the results of the actual evaluations. Finally, Section 4
presents conclusions and discusses areas for future work.

2.  Algorithm  Overviews 

2.1  Projection  Algorithm 

ARL implemented a real-time image stabilization algorithm for the Demo 1 robotics initiative [2]. The Demo 1
initiative consisted of five robotic high-mobility multi-purpose wheeled vehicles (HMMWVs) performing a variety
of robotic missions. These included tele-operated driving, autonomous road following, autonomous retrotraverse (an
automatic vehicle path retrace operation), automatic target acquisition (ATA), and automatic weapon control. The
weapon control consisted of the ATA system “handing off” target locations and velocity vectors to a weapon control
system which directed point fire (a 25mm cannon) and a laser designator (the moving target was illuminated with a
laser beam for 5 seconds) at the selected targets.

The ARL algorithm was specifically designed to eliminate motion caused by wind loading on a stationary
sensor platform. The current algorithm is capable of compensating for translational motion, and simple additions have
been proposed that would allow compensation for rotation. Since the algorithm was designed for stationary
platforms, no attempt was made to compensate for image scale change. The algorithm is based on the standard
frame-by-frame cross-correlation method [3] with modifications to allow for real-time implementation. The algorithm can
be broken down into three major parts: image mapping, projection filtering, and alignment.


2.1.1  Image  Mapping 


Each incoming image frame is mapped from its original two-dimensional image space into two distinctive
one-dimensional waveforms as shown in Figure 1. This is accomplished by using equation (2.1) to compute modified
column projections. Modified row projections are computed in a similar manner.


$$Col_k(j) = \sum_{i} Cur_k(i,j)$$

$$ColTot_k = \frac{1}{NC} \sum_{j=1}^{NC} Col_k(j)$$

$$ColProj_k(j) = Col_k(j) - ColTot_k \tag{2.1}$$

Here $Col_k(j)$ represents the unmodified projection of the $j$-th column of the $k$-th image; $Cur_k(i,j)$ is the
$(i,j)$-th pixel of the $k$-th input image; $NC$ is the number of columns; and $ColProj_k(j)$ is the modified projection
of the $j$-th column of the $k$-th image.
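
As a minimal sketch of the image mapping step, the following computes a modified column projection for a small image stored as a list of rows. It assumes that the quantity subtracted from each column sum is the mean of the column sums; the function name `column_projections` is hypothetical and not from the original system.

```python
def column_projections(image):
    """Return mean-removed column sums of a 2-D image given as a list of rows."""
    num_cols = len(image[0])
    # Unmodified projection: sum each column over all rows.
    col = [sum(row[j] for row in image) for j in range(num_cols)]
    # Assumption: the subtracted term is the mean of the column sums.
    col_tot = sum(col) / num_cols
    return [c - col_tot for c in col]


frame = [[1, 2, 3],
         [4, 5, 6]]
proj = column_projections(frame)
```

Row projections would follow the same pattern with the roles of rows and columns exchanged.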


Figure 1: Projections of columns from frame 2, real data set 1 (pre-filtered projection).

In the standard cross-correlation technique, projections from consecutive frames would now be cross-correlated.
It was found, however, that alignment accuracy could be greatly improved by filtering the projections.

2.1.2  Projection  Filtering 

For large image shifts (>10%), edge information is unique for each frame. It was found that this unique edge
information adversely affected the cross-correlation peak. To compensate for this problem, possible projection peaks
around the edges are eliminated. This is accomplished by passing each projection through the raised cosine filter
given in equation (2.2). As shown in Figure 2, this filter has the effect of lowering the amplitude of the edge
information while leaving the center regions intact.


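$$ColProj^{f}_k(j) = \begin{cases} ColProj_k(j) \times \left( 1 + \cos\left[ \pi \times (F - 1 - j)/F \right] \right) & j \in (0,F),\,(NC-F,NC) \\ ColProj_k(j) & \text{else} \end{cases} \tag{2.2}$$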

Here  F  is  the  user-selectable  filter  length,  and  NC  is  the  number  of  columns. 
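
The edge tapering can be sketched as follows. This is an illustration under stated assumptions, not the fielded filter: the weight is normalized by 1/2 so that it reaches exactly 1 where the taper meets the untouched center region, the right-hand edge is treated as a mirror image of the left, and `2 * f` is assumed not to exceed the projection length. The name `taper_edges` is hypothetical.

```python
import math


def taper_edges(proj, f):
    """Attenuate the first and last f samples with a raised-cosine ramp."""
    nc = len(proj)
    out = list(proj)
    for j in range(f):
        # Assumed normalisation: the weight grows from a small value at the
        # border to exactly 1 at j = f - 1.
        w = 0.5 * (1.0 + math.cos(math.pi * (f - 1 - j) / f))
        out[j] *= w            # left edge
        out[nc - 1 - j] *= w   # right edge, mirrored (assumption)
    return out


filtered = taper_edges([1.0] * 10, 3)
```

With a constant input, the output shows the shape of the window itself: reduced values in the outer `f` samples and unchanged values in the center.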


Figure 2: Filtered projections of columns, frame 2, real data set 1.

2.1.3  Alignment 

A cross-correlation is performed between the waveforms of frame k and the reference image. As shown in
Figure 3, this results in a unique peak location. The resulting peak is used to shift the new image into alignment.
Precision may be slightly improved by performing a regional two-dimensional cross-correlation around the intersection
of the highest row and column peak. A rotational correction step was designed, but not implemented, for this
algorithm. This step involved correlating the region around the highest row/column peak with warped (in rotation) image
chips from the reference image. The highest cross-correlation value would dictate the amount of rotation.

Figure 3: Cross-correlation of frame 1 with frame 2 (column correlation).
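
The peak search in the alignment step can be sketched as a brute-force one-dimensional cross-correlation over a limited range of integer shifts. The function name `best_shift` and the `max_shift` parameter are hypothetical; the real-time implementation may organize the computation differently.

```python
def best_shift(reference, current, max_shift):
    """Return the integer shift s that maximises sum(reference[j] * current[j + s])."""
    n = len(reference)
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Only overlapping samples contribute to the correlation score.
        score = sum(reference[j] * current[j + s]
                    for j in range(n) if 0 <= j + s < n)
        if score > best_score:
            best, best_score = s, score
    return best


ref = [0, 0, 1, 5, 1, 0, 0, 0]
cur = [0, 0, 0, 0, 1, 5, 1, 0]   # same waveform moved right by 2 samples
shift = best_shift(ref, cur, 3)
```

Applying the same search to the row projections gives the vertical shift; moving the new frame by the negative of each shift brings it into pixel-level alignment with the reference.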


2.2  FTA 1

The first feature tracking algorithm was one developed by Qinfen Zheng [4] and modified for real-time
implementation on Datacube hardware by Carlos Morimoto et al. [5] at the University of Maryland. This algorithm is a
feature-based algorithm which operates on a hierarchical image set through the use of a Gaussian pyramid. The
current real-time implementation accepts NTSC video input and displays its stabilized output at a final resolution of
128x128 pixels. This algorithm is capable of compensating for image translation, scale, and rotation. The magnitude
of motion which may be compensated for is directly related to the coarsest pyramid resolution and correlation search
space (see Section 2.2.2 on Feature Detection). In order to characterize this algorithm, the Datacube software was
ported to a Sun workstation and modified to have a final output resolution of 512x512.

The  steps  of  this  algorithm  are  as  follows: 

1.  Create Gaussian pyramid from image.

2.  Detect and select a small set of features (not performed for every image).

3.  Match area surrounding current features with previous frame.

4.  Compute estimate of image motion based on matched feature displacement.

5.  Transform next higher level of pyramid by current estimate of image motion.

6.  Repeat from step (3) for all levels of pyramid until reaching final output resolution.

2.2.1  Gaussian  Pyramid 

FTA 1 is a multi-resolution approach to image stabilization. In order to extend the range of the motion
estimation process, a multi-resolution Gaussian pyramid structure is implemented. Successive levels of this pyramid
structure “shrink” the resolution and sample density of the original image by powers of two. This has the effect of
increasing the frame-to-frame motion which may be compensated for. For example, in a three-level pyramid, a shift
of one pixel at the coarsest level corresponds to a four-pixel shift in the original image. By successively solving the
fine-motion problem at each level of the pyramid, and applying the intermediate results to warp the original image,
one can bring the original image into fine alignment. If we define the sample distance as the largest pixel shift which
may be compensated for, and assume that the image motion lies in this range, then this successive alignment
procedure ensures that the residual displacement between levels (the amount of displacement which has yet to be
determined) is less than the sample distance [6].

Formally, if we let $I_n$ be the $n$-th pyramid level for the image $I(x,y,t)$, we may form subsequent levels of the
pyramid by convolving the previous level with a kernel filter $\omega$ and sub-sampling [7]:

$$I_n = \left[ I_{n-1} \otimes \omega \right]_{\downarrow 2} \tag{2.3}$$

where $[\,\cdot\,]_{\downarrow 2}$ indicates that the quantity in brackets has been sub-sampled (both rows and columns) by a
factor of two. This technique has been successfully used to estimate motion at accuracies of a small fraction of a pixel [8, 9, 10].
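
A minimal sketch of pyramid construction is given below. The separable 5-tap binomial kernel and the border clamping are assumptions chosen for illustration (the source does not state which kernel filter is used), and the helper names are hypothetical.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # assumed 5-tap smoothing kernel


def smooth_rows(img):
    """Convolve each row with KERNEL, clamping indices at the image border."""
    w = len(img[0])
    return [[sum(k * row[min(max(x + d - 2, 0), w - 1)]
                 for d, k in enumerate(KERNEL))
             for x in range(w)] for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def reduce_level(img):
    """Blur along rows and columns, then keep every second row and column."""
    blurred = transpose(smooth_rows(transpose(smooth_rows(img))))
    return [row[::2] for row in blurred[::2]]


def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1, ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


flat = [[7.0] * 8 for _ in range(8)]
pyr = build_pyramid(flat, 3)
```

For an 8x8 input and three levels, the coarsest level is 2x2, so one pixel there spans four pixels of the original image in each direction.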

2.2.2  Feature  Detection 

For the purpose of this algorithm, a feature is defined as a point in the image which represents a large contrast
change. Object edges offer an excellent source of such contrast changes. In order to find such edges, all levels of the
Gaussian pyramid are convolved with a Laplacian mask to create an edge image. The Laplacian mask acts as an
edge detector by highlighting areas of sharp contrast change while suppressing areas of uniform contrast [11]. The
pyramid level which represents the resolution of the final stabilized image is next broken down into a number of
equal-width columns. Each of these columns is searched for the pixel highest in value, and this pixel is marked as a
feature point.
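
The feature selection just described can be sketched as follows. The 4-neighbour Laplacian mask, the use of the absolute response as the "value" of a pixel, and the names `laplacian` and `select_features` are assumptions made for illustration.

```python
def laplacian(img):
    """Absolute 4-neighbour Laplacian response; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                            + img[y][x + 1] - 4 * img[y][x])
    return out


def select_features(edge, num_strips):
    """Return one (row, col) per equal-width vertical strip: its strongest pixel."""
    h, w = len(edge), len(edge[0])
    strip_w = w // num_strips  # any leftover columns on the right are ignored
    feats = []
    for s in range(num_strips):
        cols = range(s * strip_w, (s + 1) * strip_w)
        feats.append(max(((y, x) for y in range(h) for x in cols),
                         key=lambda p: edge[p[0]][p[1]]))
    return feats


img = [[0] * 8 for _ in range(6)]
img[2][2] = 9   # strong contrast change in the left half
img[3][6] = 5   # weaker contrast change in the right half
features = select_features(laplacian(img), 2)
```

Splitting the image into strips spreads the selected features across the frame, which helps the later scale and rotation estimates.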

This method of feature extraction was selected because of its ease of implementation in a real-time system. It is
a very simple approach to feature extraction that is designed to locate areas of the image which contain sharp,
well-defined edges. The actual point selected as a feature point is not important, and may change from frame to
frame. What is important is that the area, or window, surrounding the feature point contain an area of high contrast.
In the next step of the algorithm, the feature point will be translated down through the pyramid into the coordinate
space of the finest pyramid level. A windowed, weighted correlation and subpixel matching algorithm will then be
performed on the edge image in order to determine the extent of the image motion. It is this windowed correlation
that determines the maximum image movement which may be compensated for. It is necessary for the true
correlation peak to lie within the window of the coarsest pyramid level. If this is the case, subsequent pyramid levels will
then be used to refine the estimate. This means that increasing either the number of pyramid levels or the size of the
correlation search window will result in being able to compensate for larger image motions.

2.2.3  Area  Matching

The area matching algorithm operates over a three-frame interval. Feature areas from the previous or next frame
are matched with the current frame, and motion estimates are obtained. Since this bases future stabilization estimates
on the current estimate, stabilization errors may additively propagate through the algorithm, causing a noticeable
drift in the stabilized image. This image drift is caused by both quantization error and error due to false peak
selection in the correlation output [12]. To compensate for the possible drift incurred through the use of the
cross-correlation technique, this algorithm uses a two-step, subpixel-accuracy matching algorithm. If each pixel is viewed
as a grid point in frame n, then a match to the corresponding grid point in frame n-1 is obtained by using a weighted
cross-correlation match. The match is then refined by using a differential method to achieve subpixel accuracy [12,
13]. Finally, these matched feature points are used to calculate the actual image motion.

2.2.3.1  Weighted  Correlation

There is a tradeoff in the use of a correlation match. A larger area used in computing the correlation gives
better selectivity over similar features, but less accuracy in the actual location. To obtain high location accuracy while
rejecting false peaks caused by similar features, a weighted correlation scheme is used. This correlation uses a large,
symmetrically weighted window which has high values in the center and decreasing values as the edges are
approached. The large area yields good selectivity, and the weighting maintains good feature localization. The
modified matching criterion is [12]

^  j)  ^  •^)) 

f  (m,«;t/,v)=  _  2( - ^ - A’ 

^  +  /,  n  +  y)  X  ^  Xy-fj  (/«  +  /,«+ y) 


where 


(max{i,;})’ 


|/,yj  *  |0,0},  c  =  constant 


2.2.3.2  Subpixel  Matching

It is assumed that the above correlation technique provides an accurate match to within ±0.5 pixels in each
direction. This estimate of motion is further refined using the following subpixel matching algorithm:

Assume that frame $f_2$ is offset from frame $f_1$ by $(\delta x, \delta y)$; then


The frame difference may be written as follows:
(2.6)


where the derivatives of $f_1$ may be approximated by forward differences. Examining a small neighborhood around
the feature point yields a set of simultaneous equations that may be solved to find the values of $\delta x$ and $\delta y$. Keeping
this neighborhood small allows for reduced computations as well as achieving better localization [14].
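
As a worked sketch of the differential refinement, and not a transcription of the original equations, assume the convention $f_2(x,y) = f_1(x + \delta x,\, y + \delta y)$. A first-order Taylor expansion, followed by least squares over the neighborhood of the feature point, then gives:

```latex
\begin{aligned}
\Delta f(x,y) = f_2(x,y) - f_1(x,y)
  &\approx \delta x \,\frac{\partial f_1}{\partial x} + \delta y \,\frac{\partial f_1}{\partial y}, \\
\begin{pmatrix} \sum f_x^2 & \sum f_x f_y \\ \sum f_x f_y & \sum f_y^2 \end{pmatrix}
\begin{pmatrix} \delta x \\ \delta y \end{pmatrix}
  &= \begin{pmatrix} \sum f_x \,\Delta f \\ \sum f_y \,\Delta f \end{pmatrix}.
\end{aligned}
```

Here $f_x$ and $f_y$ denote forward-difference approximations of the partial derivatives of $f_1$, and the sums run over the pixels of the small neighborhood. The sign convention and the least-squares form are assumptions; with the opposite offset convention the signs of $\delta x$ and $\delta y$ are reversed.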

2.2.4  Motion  Estimation 

From the set of matched feature pairs, we must now derive information that will allow us to warp the current
image frame into alignment with the previous image frame. It is therefore necessary to compute the scale and
rotation change, and the horizontal and vertical translation. These are represented by $s'$, $\theta'$, $\Delta X$, and $\Delta Y$, respectively.
The first parameter to be computed is the scale. Since the Euclidean distance between two feature points is invariant
to changes in translation and rotation, the scale may be computed as

$$s' = \frac{\sum_{i=1}^{N'} d_i^{\,t_2}}{\sum_{i=1}^{N'} d_i^{\,t_1}} \tag{2.7}$$

where $N'$ is the number of matched feature points, and $d_i^{\,t_1}$ and $d_i^{\,t_2}$ are the distances from the $i$-th feature
point to the center of gravity of all of the matched feature points in frames $t_1$ and $t_2$, respectively.

If we limit our frame-to-frame rotation to be small, then $\sin\theta$ and $\cos\theta$ may be approximated in linear terms.
The solution for the translation and rotation then becomes the solution of a set of linear equations:


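$$\begin{pmatrix} x_i^{t_2} \\ y_i^{t_2} \end{pmatrix} = s' \begin{pmatrix} 1 & \theta' \\ -\theta' & 1 \end{pmatrix} \begin{pmatrix} x_i^{t_1} \\ y_i^{t_1} \end{pmatrix} + \begin{pmatrix} \Delta X' \\ \Delta Y' \end{pmatrix} \tag{2.8}$$

where $(x_i^{t_2}, y_i^{t_2})$ and $(x_i^{t_1}, y_i^{t_1})$ are the coordinates of the matched feature points in frame $t_2$ and the
transformed frame $t_1$, respectively, and $s'$, $\theta'$, $\Delta X'$, and $\Delta Y'$ are the scale, rotation, and translation parameters
which need to be estimated. Solving equations (2.7) and (2.8) for these parameters provides all the information
necessary to transform the current image frame on a pixel-by-pixel basis through an affine transform to match the
location of the previous frame.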
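
The parameter estimation can be sketched as below. The scale is taken as the ratio of summed distances to the center of gravity, and the rotation and translation follow from a closed-form least-squares solution of the small-angle model. The sign convention of the rotation, the direction of the mapping (first point set onto second), and the name `estimate_motion` are assumptions made for illustration.

```python
import math


def estimate_motion(pts1, pts2):
    """Estimate (s, theta, dx, dy) mapping pts1 onto pts2 for a small rotation."""
    n = len(pts1)
    cx1 = sum(p[0] for p in pts1) / n
    cy1 = sum(p[1] for p in pts1) / n
    cx2 = sum(p[0] for p in pts2) / n
    cy2 = sum(p[1] for p in pts2) / n
    # Scale: ratio of summed distances to the centre of gravity.
    d1 = sum(math.hypot(x - cx1, y - cy1) for x, y in pts1)
    d2 = sum(math.hypot(x - cx2, y - cy2) for x, y in pts2)
    s = d2 / d1
    # Rotation: least squares on centred coordinates for the assumed model
    #   x2 = s * (x1 + theta * y1) + dx,   y2 = s * (-theta * x1 + y1) + dy
    num = den = 0.0
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        a, b, u, v = x1 - cx1, y1 - cy1, x2 - cx2, y2 - cy2
        num += u * b - v * a
        den += a * a + b * b
    theta = num / (s * den)
    # Translation: whatever remains between the two centres of gravity.
    dx = cx2 - s * (cx1 + theta * cy1)
    dy = cy2 - s * (-theta * cx1 + cy1)
    return s, theta, dx, dy


src = [(0, 0), (4, 0), (4, 4), (0, 4)]
dst = [(2 * (x + 0.01 * y) + 3, 2 * (-0.01 * x + y) - 1) for x, y in src]
s, theta, dx, dy = estimate_motion(src, dst)
```

Because the linearized rotation is not an exact rotation, the recovered scale differs from the true value by a term of order `theta ** 2`, which is negligible for the small frame-to-frame rotations assumed here.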
2.3  FTA 2


FTA 1 was designed to work on human-observed imagery from a mobile off-road vehicle. As such, this
algorithm strives to align frame n to frame n+1. This results in the system having a very short memory. For the case of
the ARL tracker, a very long system memory is desirable (i.e., aligning frame n to the reference frame). In order to
accommodate this, FTA 1 was modified to perform alignment of each incoming frame with the reference frame. The
steps of the modified algorithm are as follows:

1.  Create Gaussian pyramid from image.

2.  Detect features of image (not performed for every image).

3.  Match area surrounding current features with previous frame.

4.  Compute estimate of image motion based on feature motion.

5.  Transform next highest level of pyramid by current estimate of image motion.

6.  Repeat from step (3) for all levels of pyramid until reaching final output resolution.

7.  Match area surrounding reference frame features with current transformed frame.

8.  Compute residual motion of image based on feature motion.

9.  Transform current image by refined motion estimation.

It  should  be  noted  that  the  only  change  to  the  algorithm  is  the  addition  of  steps  7  through  9.  These  steps  provide 
additional  refinement  and  correct  for  any  residual  drift  that  was  not  removed  by  the  subpixel  matching  phase. 

2.4  ARL  Tracking  System 

The ARL automatic target acquisition (ATA) system was developed as an integrated multi-sensor (FLIR and
acoustic) approach to tracking multiple moving targets from a stationary platform. This algorithm, developed for the
Technology-based Enhancements for Autonomous Machines (TEAM) program, is currently being used by the
Robotics Demo II program. This tracking algorithm attempts to overcome many of the classic ATA problems, such as
tracking in cluttered environments, correctly distinguishing between multiple overlapping targets, and
discriminating between actual moving vehicles and “false” moving objects (dust clouds, small animals, waving tree
branches) [2]. Most ATA systems perform the following steps in processing an incoming video stream [15]:

1.  Signal preprocessing is performed to improve target contrast and reduce sensor noise.

2.  Potential targets are located.

3.  Potential targets are segmented from the background.

4.  Features of each potential target are used to discriminate real targets from non-targets.

The ARL approach to ATA fits into the above paradigm.

2.4.1  Signal  Preprocessing

In order for the ATA system to recognize the presence of targets, it must first have a target-free view of the
world. We call this image the “reference image” because it represents a target-free baseline for the system. This
image is acquired at system setup time and is allowed to slowly change to reflect changing conditions in the tracking
area (i.e., changes in illumination). Targets, however, must not be allowed to become part of this reference image. If
they were present, the ATA system would include them in the image baseline and would not continue to recognize
them as possible targets. By slowly updating the reference image, the ATA system is given time to recognize target
areas before they become part of the image baseline. The reference image is then selectively updated:


$$Ref_{k+1}(i,j) = \begin{cases} Ref_k(i,j) & \text{if there is a target at pixel } (i,j) \\ \alpha \, Ref_k(i,j) + (1-\alpha) \, Cur_k(i,j) & \text{otherwise} \end{cases} \tag{2.9}$$


where $Ref_k(i,j)$ is the $(i,j)$-th pixel of the $k$-th reference image, $Cur_k(i,j)$ is the $(i,j)$-th pixel of the input image, and
$\alpha$ $(0 < \alpha < 1)$ determines the rate at which the reference image is updated. Whether or not there is a target at a
particular pixel is determined by the ATA system’s current view of target locations.
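
The selective update can be sketched as a per-pixel recursive blend that is frozen wherever a target is currently reported. The function name `update_reference` and the Boolean `target_mask` representation are hypothetical.

```python
def update_reference(ref, cur, target_mask, alpha):
    """Blend cur into ref everywhere except at pixels flagged as target."""
    return [[r if t else alpha * r + (1.0 - alpha) * c
             for r, c, t in zip(ref_row, cur_row, mask_row)]
            for ref_row, cur_row, mask_row in zip(ref, cur, target_mask)]


ref = [[10.0, 10.0]]
cur = [[20.0, 20.0]]
mask = [[True, False]]   # a target covers the first pixel only
new_ref = update_reference(ref, cur, mask, alpha=0.9)
```

A value of `alpha` close to 1 makes the reference change slowly, which gives the tracker time to flag a new target before it is absorbed into the baseline.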

A bandpass filter is next applied to a difference image which is formed by using equation (2.11). This filter
detects differences in the “structure” of the input images which point to likely target locations. The high-pass section
of this filter reduces the effects of broad illumination changes, while the low-pass section reduces the effects of
noise. This structural difference ($SDiff$) is calculated by


$$SDiff_k(i,j) = \sum_{m,n \in R_1} \left[ Diff_k(i+m,\, j+n) - \overline{Diff}_k(i+m,\, j+n) \right] \tag{2.10}$$

where

$$Diff_k(i,j) = Cur_k(i,j) - Ref_k(i,j) \tag{2.11}$$

$$\overline{Diff}_k(i,j) = \frac{1}{N_2^2} \sum_{n,m \in R_2} Diff_k(i+m,\, j+n) \tag{2.12}$$

$R_1$ and $R_2$ are regions centered around $(i,j)$ and are of user-defined size. They are typically bounded by
$(-N_1/2,\, N_1/2)$ and $(-N_2/2,\, N_2/2)$, where $N_1$ and $N_2$ have the values 4 and 8, respectively.

2.4.2  Locating  Potential  Targets

If one examines the $SDiff$ image over time, certain noisy regions of the image (where noise is due to local
motion such as moving tree limbs) will consistently display higher values than quiet areas. Therefore, a consistent
threshold for considering an area to be “interesting” may give false alarms on noisy areas while missing real targets
over quiet areas. To deal with this phenomenon, a dynamic noise image ($Noise_k$) is used. In order for a given area to
be considered “interesting,” it must now be greater than the noise image by a user-given threshold $\delta$. This noise
image has a different value for each pixel and is determined by

$$Noise_{k+1}(i,j) = \begin{cases} Noise_k(i,j) & \text{if target at pixel } (i,j) \\ \beta \, Noise_k(i,j) + (1-\beta) \, SDiff_k(i,j) & \text{otherwise} \end{cases} \tag{2.13}$$


Once again, whether or not there is a target at pixel $(i,j)$ is determined by the ATA system’s current view of the
target locations, and $\beta$ $(0 < \beta < 1)$ is user-defined. A binary image ($Targ_k$) is then formed by


$$Targ_k(i,j) = \begin{cases} 1 & \text{if } SDiff_k(i,j) - Noise_k(i,j) > \delta \\ 0 & \text{otherwise} \end{cases} \tag{2.14}$$


where $\delta$ is a user-defined parameter. In order to make further calculations tractable (the object of this system is to
work in real time), $Targ_k$ is sub-sampled on a 2D grid of evenly spaced points. This sub-sampling reduces the
amount of remaining computation without significantly affecting the system’s ability to detect targets.
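
The noise image update and the thresholding on a coarse grid can be sketched together as follows. The function names, the `step` parameter for the grid spacing, and the choice to threshold only at the grid points (rather than thresholding every pixel and then sub-sampling) are assumptions; the two orderings give the same values at the grid points.

```python
def update_noise(noise, sdiff, target_mask, beta):
    """Recursive per-pixel noise estimate, frozen under detected targets."""
    return [[n if t else beta * n + (1.0 - beta) * s
             for n, s, t in zip(n_row, s_row, m_row)]
            for n_row, s_row, m_row in zip(noise, sdiff, target_mask)]


def detect(sdiff, noise, delta, step):
    """Binary target image evaluated on every step-th row and column."""
    return [[1 if s - n > delta else 0
             for s, n in zip(s_row[::step], n_row[::step])]
            for s_row, n_row in zip(sdiff[::step], noise[::step])]


sdiff = [[2.0, 9.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0],
         [8.0, 2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0]]
noise = [[1.0] * 4 for _ in range(4)]
targ = detect(sdiff, noise, delta=3.0, step=2)
```

In this toy input the isolated response at row 0, column 1 falls between grid points and is skipped, which illustrates why the grid spacing should stay well below the expected target size.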


2.4.3  Target  Segmentation 

The tracking system next segments the image that results from the sub-sampling by extracting connected
components [16]. The segmentation creates a set of symbolic objects which represent areas of the image which are
changing, and further reduces the image area which must be examined to locate targets. Each of these objects has an
associated set of parameters which include object shape, centroid, area, and range. These objects are next examined
to discriminate between changes due to noise and clutter, and changes due to true targets.
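
A minimal sketch of connected-component extraction on the binary image is shown below, recording the area and centroid of each object. The use of 4-connectivity, the breadth-first traversal, and the dictionary layout are assumptions; shape and range, which the tracker also stores, are omitted.

```python
from collections import deque


def connected_components(binary):
    """Label 4-connected regions of 1s; return area and centroid of each."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting from an unvisited set pixel.
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            centroid = (sum(p[0] for p in pixels) / area,
                        sum(p[1] for p in pixels) / area)
            objects.append({"area": area, "centroid": centroid})
    return objects


grid = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]]
objs = connected_components(grid)
```

Each returned entry corresponds to one symbolic object; the centroid is given as (row, column) in the coordinates of the sub-sampled grid.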

A feature which is not yet available in the real-time system, but which may be run in the computer simulation,
makes an attempt to separate two or more targets that may be overlapping by performing an optical-flow-like
algorithm [17]. To perform this analysis, a 2D grid of points is overlaid on top of the object in question. Next, a velocity
vector is computed for each of these grid points. These velocity vectors are grouped into areas of uniform motion,
and the object is then resegmented.

The velocity vector is computed by determining the motion of a small neighborhood of points centered on each
grid point. This computation is accomplished by performing a template matching between regions in the current
image and the previous image:

$$Vel_{i,j} = \min_{m,n} \left( \sum_{i,j \in R_1} \left( Cur_k(i,j) - Cur_{k-1}(i+m,\, j+n) \right)^2 \right) \tag{2.15}$$


Equation  (2,15)  attempts  to  minimize  the  sum  of  squared  differences  over  a  region  R],  Region /?i  is  determined 

by  the  amount  a  target  in  region  would  move  in  a  single  jframe  time.  Once  the  velocities  for  each  grid  point  are 

computed,  similar  adjacent  velocity  vectors  are  grouped  into  regions  of  uniform  motion.  These  regions  of  uniform 
motion  are  then  used  to  resegment  the  set  of  objects  which  was  previously  generated.  This  resegmentation  will  break 
apart  overlapping  targets  that  are  moving  in  different  directions  or  at  different  speeds.  It  also  helps  to  eliminate  high- 
clutter  areas  since  they  will  usually  exhibit  random  motion. 
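The template matching described above can be sketched as an exhaustive sum-of-squared-differences search. This is a minimal illustration under stated assumptions: the function name `ssd_offset`, the neighborhood half-width `half`, and the search radius `search` are hypothetical, and frames are assumed to be lists of rows of intensities.

```python
def ssd_offset(prev, curr, center, half=1, search=2):
    """Find the offset (u, v) minimizing the sum of squared differences.

    The neighborhood of `curr` centered on `center` is compared with the
    neighborhood of `prev` centered on `center + (u, v)`. The returned offset
    therefore points to where the patch was in the previous frame; the motion
    from previous to current frame is its negation.
    """
    rows, cols = len(curr), len(curr[0])
    ci, cj = center
    best, best_uv = None, (0, 0)
    for u in range(-search, search + 1):
        for v in range(-search, search + 1):
            ssd = 0
            valid = True
            for i in range(ci - half, ci + half + 1):
                for j in range(cj - half, cj + half + 1):
                    pi, pj = i + u, j + v
                    inside = (0 <= i < rows and 0 <= j < cols
                              and 0 <= pi < rows and 0 <= pj < cols)
                    if not inside:
                        valid = False  # skip offsets that leave the image
                        break
                    ssd += (curr[i][j] - prev[pi][pj]) ** 2
                if not valid:
                    break
            if valid and (best is None or ssd < best):
                best, best_uv = ssd, (u, v)
    return best_uv
```

Running this at every grid point yields one vector per point; adjacent points with similar vectors can then be grouped into regions of uniform motion.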

2.4.4  Target  Discrimination 

The above algorithms generate a set of potential targets. The tracker must decide which of these objects represent
true targets and which are the results of noise and clutter. A survey of various multi-target tracking algorithms
is given in [18]. The ARL tracker tracks each object over several frames before the determination of “target” or
“clutter” is made. This allows the system to make a more informed decision, and allows for better estimation of
object properties.

The fundamental problem which must be overcome is that of determining the frame-to-frame correspondences
of objects. ARL uses a best-first search [19] algorithm to determine an optimal frame-to-frame correspondence of
the objects. Target properties, such as velocity and range, are used to limit the set of possible correspondences.
Additionally, multiple objects may be matched to a single previous object. This allows for the fact that poor
segmentation may cause an object to “break apart” into multiple objects. The goodness of any particular
correspondence is based on the weighted differences of the object properties. The result of the correspondences is an
updated list of possible targets. Each of these objects’ properties is then examined to determine if the object is
exhibiting “target-like” behavior. Factors such as object permanence, velocity, and size are examined. A violation in
any one area is enough to rule out an object’s being a target. If the object passes these tests, it is added to the
“target” list. Objects on the target list are displayed on the tracker’s output by surrounding them with white boxes,
while areas which exceed the motion threshold are shown by having black boxes overlaid on them. A typical output
frame from the tracker is shown in Figure 4.
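The idea of scoring a correspondence by weighted property differences can be sketched as follows. This is not the best-first search of [19]; it is a simpler greedy stand-in that accepts the cheapest pairs first. The property names, the weights, and the `max_cost` gate are all hypothetical choices made for illustration.

```python
import heapq


def match_cost(prev_obj, obj, weights):
    """Weighted sum of absolute property differences (lower is better)."""
    return sum(w * abs(obj[k] - prev_obj[k]) for k, w in weights.items())


def correspond(prev_objs, objs, weights, max_cost):
    """Map each current object to its cheapest previous object.

    Several current objects may map to the same previous object, which allows
    for an object that has broken apart. Pairs costing more than `max_cost`
    are rejected, a simple stand-in for limiting the candidate set.
    Returns {current_index: previous_index}.
    """
    heap = []
    for i, obj in enumerate(objs):
        for j, prev in enumerate(prev_objs):
            cost = match_cost(prev, obj, weights)
            if cost <= max_cost:
                heapq.heappush(heap, (cost, i, j))
    assignment = {}
    while heap:
        _, i, j = heapq.heappop(heap)
        if i not in assignment:  # keep only the cheapest match per object
            assignment[i] = j
    return assignment
```

A current object that receives no assignment would be treated as a new candidate and tracked over further frames before any “target” or “clutter” decision is made.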

3.  Algorithm  Characterization 

The goal of this paper is to evaluate stabilization algorithms of widely varying computational complexities in a
way that has meaning in the real world. The classic sum of mean square errors (MSEs) is used to show that one
algorithm offers better performance in the mean square sense than another algorithm. But what does better
stabilization in the mean square sense mean to a real-time imaging application? How much error is acceptable? Are the
added expense and complexity needed to implement the more computationally expensive algorithms justified? To
perform such an evaluation, a real-time image processing system had to be chosen as the evaluator. For the purpose
of this paper, the ARL real-time tracker was chosen. The tracking system was run on the output of each stabilization
algorithm, and its relative performance was monitored. The relative tracker performance was then used to determine
which algorithms were “good enough” to be used as a front-end system to provide image stabilization for the tracker.


Figure  4:  Typical  tracker  output. 


3.1 ARL Tracker Performance Criteria

The output of the ARL tracking system consists of five parameters: target number, target height, target width,
target velocity, and target centroid. During proper operation, the target number remains constant from track
initiation until the target leaves view and is used to identify the remaining parameters. The target height, target
width, target velocity, and target centroid are used by applications that use the tracking system as a front-end
preprocessor. These systems include weapon control systems and automatic target recognition systems. For the purpose
of this evaluation, it was decided to evaluate tracker performance based on the accuracy of these parameters, along
with false alarm rate and time to acquire target from first sighting. The baseline for this evaluation was created by
observing tracker performance on a sequence obtained from a stationary forward looking infrared imager (FLIR). One
synthetic non-stable FLIR image sequence and two real non-stable FLIR image sequences were then run through
each stabilization algorithm for evaluation. Both real sequences used for evaluation were 100 frames in length, taken
at a frame rate of ten frames per second.
Table 1: Data Set 1 Image Transform Coordinates


           Absolute Shift            Shift From Frame N-1
Frame #    Row      Col     Angle    Row      Col     Angle
1          0.0      0.0     0.0      -        -       -
2          0.1      0.0     0.0      0.1      0.0     0.0
3          0.3      0.0     0.0      0.2      0.0     0.0
4          0.7      0.0     0.0      0.4      0.0     0.0
5          2.0      0.0     0.0      1.3      0.0     0.0
6          4.0      0.0     0.0      2.0      0.0     0.0
7          7.0      0.0     0.0      3.0      0.0     0.0
8          11.0     0.0     0.0      4.0      0.0     0.0
9          16.0     0.0     0.0      5.0      0.0     0.0
10         10.0     0.0     0.0      -6.0     0.0     0.0
11         3.0      0.0     0.0      -7.0     0.0     0.0
12         -5.0     0.0     0.0      -8.0     0.0     0.0
13         -14.0    0.0     0.0      -9.0     0.0     0.0
14         -4.0     0.0     0.0      10.0     0.0     0.0
15         -3.0     1.0     0.0      1.0      1.0     0.0
16         -1.0     3.0     0.0      2.0      2.0     0.0
17         2.0      6.0     0.0      3.0      3.0     0.0
18         6.0      10.0    0.0      4.0      4.0     0.0
19         11.0     15.0    0.0      5.0      5.0     0.0
20         17.0     9.0     0.0      6.0      -6.0    0.0
21         10.0     2.0     0.0      -7.0     -7.0    0.0
22         2.0      -6.0    1.0      -8.0     -8.0    1.0
23         2.0      -6.0    3.0      0.0      0.0     2.0
24         2.0      -6.0    6.0      0.0      0.0     3.0
25         2.0      -6.0    10.0     0.0      0.0     4.0
26         2.0      -6.0    5.0      0.0      0.0     -5.0
27         2.0      -6.0    -4.0     0.0      0.0     -9.0
28         2.0      -6.0    3.0      0.0      0.0     7.0
29         2.0      -6.0    11.0     0.0      0.0     8.0
30         2.0      -6.0    2.0      0.0      0.0     -9.0
31         2.0      -6.0    -8.0     0.0      0.0     -10.0
32         2.0      -6.0    1.0      0.0      0.0     9.0
33         2.0      -6.0    3.0      0.0      0.0     2.0
34         4.0      -4.0    6.0      2.0      2.0     3.0
35         7.0      -1.0    10.0     3.0      3.0     4.0
36         11.0     3.0     5.0      4.0      4.0     -5.0


The baseline FLIR sequence was obtained from a FLIR imager mounted on a fixed tripod. During this image
sequence, five targets are observed. The synthetic image sequence consists of a single background image which has
a target image superimposed on it. The generated sequence is then operated on by an affine transform to simulate
image instability. Translations as well as rotations are simulated as shown in Table 1. The simulated image shifts
ranged in value from subpixel shifts to instantaneous shifts of 8 pixels in both the row and column directions.
Rotational shifts were simulated in a range from 1 to 10 degrees. Finally, both shifts and rotations were simulated
simultaneously. Real sequence Nos. 1 and 2 consist of real FLIR imagery collected from a tripod-mounted FLIR imager.

For real sequence No. 1, the imager was panned horizontally and vertically to obtain image rotations around the
x and y axes. During these image rotations, five targets are observed. For real sequence No. 2, the imager was
panned horizontally and vertically, as well as rotated. This allowed for the collection of data that contained image
rotations around all three image axes. During real sequence No. 2, four targets were observed.

3.2  Performance  Baseline 

The MSEs between frames and between frame 1 and the current frame for the baseline image sequence are
shown in Figure 5. For the purpose of this paper, MSE is defined as the square root of the sum of the squares of the
pixel differences in the image set. While MSE is not the measure by which this paper is judging stabilization
algorithms, it does offer some insight into the stability of the image sequence. The MSE between frames may be viewed
as a measure of how rapidly the image sequence is changing, and the MSE between frame 1 and the current frame may
be viewed as a measure of how much the current image has drifted away from the first or reference image. As would
be expected for a stable sequence, the MSE remains relatively constant, with the frame-to-frame MSE slightly lower
than the reference-frame-to-current-frame MSE.
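The two MSE curves can be computed as in the following minimal sketch, which follows the verbal definition given in the text (square root of the sum of squared pixel differences). The function names are hypothetical, and frames are assumed to be equally sized lists of rows.

```python
import math


def mse(frame_a, frame_b):
    """Square root of the sum of squared pixel differences between two frames."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(frame_a, frame_b)
                         for a, b in zip(row_a, row_b)))


def mse_curves(frames):
    """Return (frame n to frame n+1 MSEs, frame 1 to current frame MSEs)."""
    between = [mse(frames[n], frames[n + 1]) for n in range(len(frames) - 1)]
    to_reference = [mse(frames[0], frame) for frame in frames[1:]]
    return between, to_reference
```

A sequence that changes by the same amount each frame gives a flat frame-to-frame curve but a steadily growing reference-to-current curve, which is the signature of drift.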


Figure 5: Baseline MSE performance. (Plot title: Stable Data Set. Curves: Frame 1 with Current Frame; Frame(n) with Frame(n+1).)

During the image baseline sequence, five targets were observed. The average time for a target to be detected
and tracked was 1.5 frames from full target view (the frame when the target comes into full view of the tracker) or 5
frames from first target appearance (the frame when any part of the target becomes visible). The reason for the short
time between full target view and target tracking is that the tracker has already started to anticipate a target track by
the time the target is in full view. The tracker maintained a consistent track ID on all targets throughout the target
visibility period, and maintained a target box that was larger than the true size of the target by at most 60%,
with the centroid of the box on the centroid of the target. The tracker normally maintains a tracking box that is larger
than the predicted target size to ensure that the full target is segmented and passed to other applications. Note that on
the baseline run, the tracker performed a “target hand-off,” a phenomenon that sometimes occurs when one target is
exiting a region of the scene as another is simultaneously entering. The tracker may then mistakenly hand off the
target ID of the exiting target to the new target.

The tracking threshold for this image set was set at a value of 4. The tracking threshold, an important measure
of tracking sensitivity, represents the average change that must occur in an image region for a change detection
alarm to be triggered. Misaligned images will tend to appear noisier due to frame-to-frame image motion. This
necessitates raising the tracking threshold and thus lowering the tracking sensitivity.
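The role of the tracking threshold can be illustrated with a minimal sketch. It assumes that “average change” means the mean absolute pixel difference between a reference image and the current image over a rectangular region; the function name and the region encoding are hypothetical.

```python
def change_alarm(reference, current, region, threshold):
    """Return True if the mean absolute change inside `region` exceeds `threshold`.

    `region` is (first_row, first_col, last_row, last_col), inclusive.
    """
    r0, c0, r1, c1 = region
    total = 0
    count = 0
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            total += abs(current[r][c] - reference[r][c])
            count += 1
    return total / count > threshold
```

Residual misalignment adds to the measured change everywhere there is image structure, so a threshold that is adequate for a stable sequence produces alarms on a poorly stabilized one unless it is raised.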
3.3 Results From Synthetic Data Set 1

As shown in Figure 6, the between-frame MSE is low for all three algorithms until the instantaneous row and
column shift is greater than 4 (frame 18, see Table 1). At this point, the projection method breaks down and is no
longer able to correct for any of the larger shift values. Both feature tracking algorithms perform very well on all
shift sequences tried.


Figure 6: Synthetic data set 1, MSE frame n to frame n+1.


The projection method immediately breaks down when it is faced with trying to compensate for image rotations.
As shown in Figure 6, this method provides a very slight improvement in MSE over the raw image sequence.
Once again, both feature tracking algorithms perform well when compensating for both rotations and simultaneous
translation/rotation. Both break down for instantaneous rotations of over 9-10 degrees.

Figure 7 shows the MSE of the current frame compared to the reference frame. Note that both the projection
and feature tracking 2 algorithms exhibit very low MSE until the algorithms break down. Feature tracking algorithm
1 exhibits a noticeable drift in the MSE. This drift indicates that the subpixel matching is not fully compensating for
all of the error and that the error is accumulating.
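Why a small uncorrected subpixel error turns into drift can be shown with a simple model. This is an illustrative sketch only: it assumes that registering each frame to its predecessor leaves a small residual error that adds along the sequence, whereas also registering to the reference frame leaves only the current frame's residual. The function names and the 0.05-pixel figure in the usage note are hypothetical, not measured values.

```python
def chained_error(per_frame_errors):
    """Misalignment relative to the reference when frames are registered only
    frame to frame: the residual errors accumulate along the sequence."""
    drift = []
    total = 0.0
    for error in per_frame_errors:
        total += error
        drift.append(total)
    return drift


def referenced_error(per_frame_errors):
    """Misalignment relative to the reference when each frame is also realigned
    to the reference frame: only the current residual remains."""
    return list(per_frame_errors)
```

For instance, a constant residual of 0.05 pixel per frame grows to about 1.8 pixels after 36 frames under the chained model, even though every individual frame-to-frame alignment looks nearly perfect.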

The  next  item  examined  was  the  actual  tracker  performance  on  each  of  the  stabilized  sequences  which  is  shown 
in  Table  2.  For  all  the  data  sets,  the  tracker  was  run  with  a  threshold  of  9,  which  gave  the  best  level  of  detection 
while  eliminating  false  alarms. 
Figure 7: Synthetic data set 1, MSE reference frame to current. (Plot title: Manual Data Set, MSE of Frame 1 with Current Frame. Curves: Raw Image; Feat. Trk. 1; Feat. Trk. 2; Projection.)

Both the projection algorithm and FTA 2 experienced no false alarms while they maintained valid tracks. When
examined over the entire sequence, FTA 1 experienced an average of one false alarm per frame. Appendix A shows
the actual error in each of the algorithms. Even though this appendix shows that FTA 2 performed the best in
stabilizing the image sequence, the projection algorithm appears to allow for better segmentation through frame 16.
When the tracker was run on the output of both FTA 1 and FTA 2, it appears that the single target was incorrectly
segmented into two objects. However, appearances may be deceiving. This apparently incorrect segmentation is due
to parts of the target being the same temperature as the background through which the target is traveling. With
proper stabilization, the separated target regions appear to be two separate entities. When the projection algorithm
was used, there remained enough uncorrected motion that the two motion regions blurred into a single, correctly
sized region.

Table 2: Tracker performance on synthetic data set 1.

Algorithm     First Frame Target Tracked    Loses Segmentation    Loses Track
Projection    6                             14                    21
FTA 1         7                             32                    32
FTA 2         7                             13                    24


3.4  Results  From  Real  Data  Set  1 

Real data set 1 contains imagery with rotations around the x and y axes. As seen in Figure 8, all three algorithms
performed with approximately the same MSE when frame n is compared to frame n+1. The main distinguishing
characteristic is that the projection algorithm suffered from several grossly misaligned frames (as seen from the
large spikes in the graph).
Figure 8: Real data set 1, MSE, frame n to n+1.

Figure 9 shows that the projection algorithm also suffered misalignment from the reference frame in several
regions. In the early part of the image sequence, this misalignment did not adversely affect the tracker performance.
This may be the result of the tracker’s adaptive reference image. Since the early projection errors presented a step
function, the reference image was able to adapt to the new level and the tracker was able to continue to perform
well. However, later in the sequence, the projection algorithm drifted out of alignment. This drift presented the
tracker with a stabilized input that had a ramp in addition to the step present in the MSE (see Figure 9, frame 37 on).
The tracker was unable to perform in the presence of this MSE ramp. When the tracker threshold was set to a value
of 17, the tracker experienced no false alarms until frame 39 (a point slightly after the beginning of the ramp). After
that point, the tracker averaged 1.5 false alarms per frame and had difficulty in maintaining consistent track
identifiers on the targets.

FTA 1 also had a problem with drift in the stabilized image sequence. This drift may be seen as the ramp in the
MSE of Figure 9. In fact, this drift made the tracking system unable to use this sequence as input. As shown in
Figure 10, when the tracking threshold was set low enough to detect the vehicle’s motion, the entire roadway
showed up as a motion region. This prevented the tracker from performing proper segmentation and establishing a
track. When the threshold was raised enough to prevent the roadway from being tagged as a motion region, the
vehicle failed to be detected. In both cases, numerous false alarms were generated.
Figure 9: Real data set 1, MSE current frame to reference frame.


(a)  Output  of  tracker,  threshold  15.  (b)  Output  of  tracker,  threshold  20. 

Figure 10: FTA 1 performance on real data set 1.


FTA 2 allowed the tracker to perform very well against this data set. In fact, when the algorithm was running at
a threshold of 12, every target was detected within 5 frames, and the average time to detect a target from full view
was 3.3 frames. Lower thresholds increased the false alarm rate without significantly improving the target detection
speed or segmentation. Over the entire 100-frame data set, only two frames contained a single false alarm each. The
segmentation resulting from this algorithm was good, although the target box was on average only 80% of the size
of the actual target. This size may be accounted for by the higher threshold, which was necessary. This higher
threshold lessens the sensitivity of the tracker, and could cause it to miss faint, outer edge target pixels. Table 3
summarizes the tracker performance for this data set.
Table 3: Tracker performance on real data set 1.


3.5  Results  From  Real  Data  Set  2 

Real data set 2 involved imagery that had rotations around all three image axes. As may be seen from Figure 11,
all the algorithms performed inconsistently with respect to frame-to-frame MSE. Figure 12 shows significant drift in
all the algorithms. The tracker was unable to track any targets on the output from the projection algorithm and FTA
1. In fact, the projection algorithm failed to provide any noticeable (to the human eye) improvement in the image
sequence. Performance on the output of FTA 2 was the best of the three algorithms characterized. With a tracking
threshold of 12, the tracker required between 1 and 13 frames with an average of 7 frames to acquire targets once
they were in full view. Although the tracker was able to detect and track all the targets, segmentation was poor, with
an average of only 67% of the targets being segmented. False alarms were the main problem encountered in using
the output of FTA 2, with a maximum of five false alarms in a single frame and an average of one false alarm per
frame. These results are summarized in Table 4.


Figure 11: Real data set 2, MSE frame n to frame n+1.
Table 4: Tracker performance on real data set 2.


Algorithm     Threshold    % targets detected    # frames to acquire target    Average false alarms/frame    % targets segmented
Projection    17           0                     NA                            NA                            NA
FTA 1         12           0                     NA                            NA                            NA
FTA 2         12           100                   7                             1                             67

Figure 12: Real data set 2, MSE current frame to reference frame.


4.  Conclusions 

4.1  Projection  Algorithm 

The ARL projection algorithm represents the simplest of the stabilization algorithms evaluated. It can compensate
for image translations to the nearest pixel. No subpixel matching or rotation correction is attempted. For both
synthetic data set 1 and real data set 1, the algorithm performed well enough to allow the tracker to detect the
majority of the targets with consistent tracks. The algorithm is unable to correct for rotations or scale changes, and
therefore did not significantly stabilize real data set 2.

The largest drawback of this algorithm was the number of false alarms generated. These false alarms were present
even with high tracker detection thresholds, showing that subpixel matching of the image sequence is necessary
for proper tracker performance. Overall, it was found that this algorithm would be unacceptable as a front-end
system for the tracker.
4.2 FTA 1


FTA 1 represents a significant increase in computational complexity over the projection algorithm. This algorithm
was designed to operate on a moving platform and to eliminate high-frequency image movement due to rough
terrain. As such, this algorithm stabilizes frame n to frame n+1. When this algorithm is used to evaluate the current
frame against a slowly changing reference frame, extremely accurate subpixel matching must be present to ensure
that errors do not propagate as the image sequence progresses. Unfortunately, real-world considerations, such as
pixel blur due to camera motion, illumination changes, and perspective distortions, make such accurate matching
nearly impossible. As shown in previous sections, this algorithm greatly reduces the mean square error of frame n
relative to frame n+1 as compared to the raw sequence. In fact, the output image sequence looks very stable to the
human eye. However, the drift that is present when one is viewing the reference frame compared to the current
frame is significant enough that the tracker is unable to detect and track targets in the image sequences. The tracker
performance shows that imagery that appears stable to a human observer may still have enough motion artifacts
remaining that a computer does not accept the imagery as stable. Some of the error in stabilization may possibly be
eliminated by a more sophisticated feature detection algorithm. The current feature detection scheme was found to
pick features that lie on power lines (an object that makes feature rejection very difficult) and moving objects like
tree limbs (causing the algorithm to compensate for tree motion instead of camera motion). Stable, unique features
which exhibit sharp edges and high contrast would allow the feature tracking algorithm and stabilization to be more
accurate. Overall, this algorithm would not be useful as a front-end processor for the tracking system.

4.3  FTA  2 

FTA 2 is based on FTA 1, with the addition of an extra step to realign each frame with the reference frame. As
may be seen from Table 5, the performance difference between this algorithm and FTA 1 is very small. The average
correction between the two algorithms is on the order of several hundredths of a pixel. However, this small
difference is very important to the tracker. The false alarm rate drops to zero, and the percentage of targets detected
increases to 100%. These figures clearly show the need for accurate subpixel matching in computer-viewed imagery.


Table  5:  Average  differences  between  FTA  2  and  the  other  two  algorithms. 


                      Row     Column    Rotation    Scale
FTA 1 - FTA 2         0.38    0.23      0.10        0.00
Projection - FTA 2    0.59    0.30      0.06        0.00


While operating on real data set 1, the tracker showed no significant performance loss relative to the stable data
set. This performance shows that for unstable data with rotations around the x and y axes, the extra computational
cost of FTA 2 is necessary for proper operation. When working against data which involved rotations around all three
axes (up to approximately 45 degrees of total rotation around the z axis), the algorithm did not fare as well. Further
investigation is necessary to determine the cause of the algorithm’s breakdown, and exactly what range of rotation
the algorithm is capable of compensating for. Overall, this algorithm would make a very good front-end preprocessor
for a modified ARL system which would allow for the scanning of a large tracking area.
4.4 Additional Research

This paper has used only stabilization algorithms which have public domain simulations available. To be truly
complete, commercial algorithms, such as the one presented in Appendix C, which are proprietary in nature, must
also be studied.

This paper has used only three FLIR data sets to evaluate the algorithms. Additional data sets that provide scale
changes as well as more combinations of instabilities need to be examined. In addition, the performance of the
algorithms against imagery from other sensors (video, laser radar, synthetic aperture radar) needs to be characterized.
Appendix A. Algorithm Errors on Synthetic Data Set 1


           Feature Tracking Algorithm 1       Feature Tracking Algorithm 2       Projection Algorithm
Frame #    Row      Col     Angle    Scale    Row      Col     Angle    Scale    Row      Col     Angle
1          0.01     0.01    0.00     0.00     0.02     0.02    0.00     0.00     0.00     0.00    0.00
2          0.07     0.16    0.07     0.00     0.02     0.21    0.08     0.00     0.10     0.00    0.00
3          0.65     0.12    0.12     0.00     0.05     0.11    0.09     0.00     0.30     0.00    0.00
4          1.53     0.12    0.06     0.00     0.12     0.10    0.04     0.00     0.70     0.00    0.00
5          2.38     0.23    0.05     0.00     0.27     0.29    0.02     0.00     0.00     0.00    0.00
6          2.61     0.38    0.03     0.00     0.25     0.03    0.05     0.00     1.00     0.00    0.00
7          2.95     0.35    0.04     0.00     0.10     0.06    0.01     0.00     1.00     0.00    0.00
8          3.29     0.32    0.07     0.00     0.07     0.15    0.05     0.00     1.00     0.00    0.00
9          3.89     0.40    0.21     0.00     0.14     0.16    0.01     0.00     8.00     0.00    0.00
10         4.24     0.42    0.25     -0.01    0.18     0.18    0.02     0.00     8.00     0.00    0.00
11         5.12     0.41    0.27     -0.01    0.19     0.32    0.05     0.00     7.00     0.00    0.00
12         5.59     0.49    0.19     -0.01    0.10     0.20    0.01     0.00     6.00     0.00    0.00
13         5.93     0.50    0.12     -0.01    0.18     0.12    0.04     0.00     11.00    0.00    0.00
14         6.27     0.57    0.13     -0.01    0.19     0.12    0.04     0.00     1.00     1.00    0.00
15         6.59     0.74    0.16     -0.01    0.22     0.17    0.05     0.00     2.00     1.00    0.00
16         6.21     0.02    0.01     -0.01    0.07     0.17    0.00     0.00     2.00     1.00    0.00
17         7.13     0.00    0.04     -0.01    0.18     0.19    0.03     0.00     2.00     1.00    0.00
18         7.80     0.08    0.00     -0.01    0.25     0.19    0.06     0.00     1.00     1.00    0.00
19         8.06     0.04    0.04     -0.01    0.15     0.10    0.04     0.00     2.00     9.00    0.00
20         8.55     0.41    0.11     -0.01    0.06     0.08    0.03     0.00     9.00     8.00    0.00
21         9.15     0.63    0.17     0.00     0.24     0.19    0.00     0.00     8.00     7.00    0.00
22         9.16     1.04    0.73     0.00     0.11     0.05    1.04     0.00     2.00     2.00    1.00
23         8.62     0.48    1.86     0.00     1.14     0.71    2.11     0.00     6.00     3.00    3.00
24         9.09     0.04    2.39     0.00     1.87     2.19    3.01     0.00     11.00    4.00    6.00
25         7.22     1.79    3.65     0.00     3.54     4.48    3.94     0.00     18.00    8.00    10.00
26         5.02     4.33    5.33     0.00     6.14     7.80    4.95     0.00     9.00     2.00    5.00
27         8.34     1.54    9.27     0.00     2.82     3.66    9.07     0.00     9.00     4.00    -4.00
28         17.56    1.45    6.98     -0.01    4.23     2.24    7.26     0.00     6.00     2.00    3.00
29         14.44    1.69    8.19     -0.01    1.57     2.53    8.31     0.00     19.00    8.00    11.00
30         10.16    6.48    8.47     0.00     5.76     8.51    9.19     0.00     4.00     2.00    2.00
31         15.43    1.93    8.88     0.00     0.00     1.19    10.26    0.00     25.00    1.00    -8.00
32         26.48    5.58    9.50     0.01     11.77    0.85    7.69     0.01     2.00     2.00    1.00
33         23.14    8.72    2.90     0.01     7.97     6.47    0.48     0.01     5.00     3.00    3.00
34         23.54    8.53    3.53     0.01     9.94     6.98    0.30     0.01     11.00    3.00    6.00
35         22.39    8.84    4.58     0.02     9.22     8.07    1.02     0.02     22.00    3.00    10.00
36         19.97    9.39    4.29     0.02     6.22     9.81    8.11     0.03     12.00    2.00    5.00

Table units are pixels for row and column shifts, degrees for angle shifts, and percent for scale changes.


The  row,  column,  and  angle  values  in  the  table  in  Appendix  A  were  computed  by  determining  the  absolute 
value  of  the  difference  between  the  actual  image  motion  (as  given  in  Table  1)  and  the  motion  reported  by  the  various 
algorithms.  The  scale  value  is  the  actual  scale  value  (100%  for  the  entire  sequence)  minus  the  reported  scale  value. 
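The error computation described above can be written out as a short sketch. The function name and the dictionary keys are hypothetical; the arithmetic follows the description in the text (absolute difference for row, column, and angle; actual minus reported for scale).

```python
def stabilization_errors(actual, reported):
    """Per-frame errors between the true motion and the motion an algorithm reports.

    Both arguments are dicts with keys "row", "col", "angle", and "scale".
    """
    errors = {key: abs(actual[key] - reported[key])
              for key in ("row", "col", "angle")}
    # Scale error keeps its sign: actual scale minus reported scale.
    errors["scale"] = actual["scale"] - reported["scale"]
    return errors
```

For example, if the true motion of a frame is 2.0 rows, -6.0 columns, and 3.0 degrees and an algorithm reports 1.5, -6.25, and 3.5, the row, column, and angle errors are 0.5, 0.25, and 0.5.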
Appendix B. Difference Between FTA 1 and FTA 2


Frame 

Column 

Rotation  I  Scale  |  Frame 

Column 

Rotation 

Scale 

1 

0.00 

0.00 

EIsEliiia 

51 

EES 

-0.01 

0.33 

99.70% 

2 

0.17 

-0.03 

99.93% 

52 

E@ 

0.19 

-0.05 

99.88% 

3 

-0.49 

0.16 

0.03 

99.57% 

53 

-0.82 

0.16 

-0.03 

100.64% 

4 

0.47 

-0.31 

-0.08 

54 

0.60 

-0.37 

-0.06 

99.50% 

5 

-0.13 

0.07 

100.06% 

55 

EIS 

-0,07 

100.05% 

6 

0.17 

0.02 

99.73% 

56 

EBl 

-0,01 

100.07% 

7 

-0.04 

0.05 

100.03% 

57 

M)43 

-o.os 

0.03 

100,22% 

8 

KES 

0.14 

-0.16 

100.33% 

58 

ElB 

-0.43 

0.14 

100.25% 

9 

-0.25 

-0.10 

-0.01 

99.80% 

I  0-25 

-0.41 

0.42 

99.63% 

10 

-0.31 

0.16 

0.12 

100.08% 

60 

EES 

0.32 

-0.07 

100.20% 

11 

-0.14 

100.01% 

61 

0.86 

0.06 

0.04 

100.44% 

12 

[illegible]

0.13 

0.06 

99.92% 

62 

1.06 

-0.1C 

0.09 

99.89% 

13 

-0.44 

-0.12 

0.09 

99.66% 

63 

0.81 

-0.38 

0.20 

99.48% 

14 

[illegible]

0.19 

0.01 

99.81% 

64 

0.77 

-0.11 

-0.08 

100.07% 

15 

0.03 

-0.22 

99.99% 

65 

0.39 

-0.40 

0.09 

99.60% 

16 

-0.39 

-0.17 

-0.14 

99.83% 

66 

0.67 

-0.08 

0.21 

100.39% 

17 

[illegible]

0.17 

99.92% 

67 

[illegible]

0.16

100.32% 

18 

-0.53 

0.10 

99.44% 

68 

-0.14 

0.27 

-0.16 

99.93% 

19 

-0.67 

0.37 

-0.08 

100.23% 

69 

0.40 

0.96 

-0.12 

99.81% 

20 

[illegible]

1.54 

-0.06 

100.73% 

70 

0.09 

0.10 

100.09% 

21 

-1.37 

1.77 

0.05 

71 

-0.81 

-0.18 

0.10 

99.60% 

22 

-0.02 

0.01 

100.06% 

72 

[illegible]

0.12 

-0.09

100.10% 

23 

0.16 

0.37 

99.75% 

73 

[illegible]

-0.03 

0.32 

100.38% 

24 

0.13 

-0.21 

-0.10 

99.99% 

74 

1  107 

0.17 

100.64% 

25 

[illegible]

0.23 

0.05 

100.47% 

75 

0.11 

0.17 

100.20% 

26 

[illegible]

0.14 

0.12 

100.11% 

76 

0.30 

-0.09

100.00% 

27 

-0.23 

0.11 

100.42% 

77 

0.05

-0.16 

-0.08 

100.10% 

28 

-0.12 

0.17 

-0.03 

100.03% 

78 

[illegible]

0.02 

0.22 

99.75% 

29 

[illegible]

-0.22 

-0.09 

100.10% 

79 

-0.35 

0.11 

99.81% 

30 

0.17 

0.15 

100.11% 

80 

-0.20 

0.02 

-0.18 

99.94% 

31 

0.16 

0.04 

-0.06 

99.64% 

81 

0.07 

0.12 

-0.04 

100.20% 

32 

0.35 

-0.16 

99.91% 

82 

-0.06 

0.14 

99.87% 

33 

[illegible]

0.23 

-0.15 

100.05% 

83 

[illegible]

-0.13 

0.02 

100.01% 

34 

-0.12 

0.10 

99.95% 

84 

[illegible]

-0.01 

-0.16 

100.27% 

35 

0.41 

-0.05 

99.92% 

85 

0.62

-0.01 

100.05% 

36 

0.06 

-0.04 

100.37% 

86 

0.22 

99.87% 

37 

-0.36 

0.08 

100.01% 

87 

0.06 

0.00 

100.05% 

38 

[illegible]

0.35 

0.04 

99.90% 

88 

0.58 

0.09 

-0.18 

100.12% 

39 

[illegible]

-0.19 

0.00 

99.91% 

89 

-0.09 

-0.11 

-0.10 

99.88% 

40 

[illegible]

0.28 

-0.06 

99.97% 

90 

0.21 

0.16 

-0.01 

100.10% 

41 

0.07 

0.12 

0.13 

99.76% 

91 

-0.74 

0.14

99.88% 

42 

-0.06 

-0.03 

0.08 

100.06% 

92 

0.08 

-0.05 

100.08% 

43 

-0.35 

-0.05 

93 

-0.06 

0.09 

100.09% 

44 

0.22 

-0.13 

99.95% 

94 

0.36 

-0.03 

100.15% 

45 

[illegible]

0.47 

0.01 

100.06% 

95 

[illegible]

-0.22 

-0.12 

99.86% 

46 

0.05 

0.08 

99.68% 

96 

[illegible]

0.02 

-0.01 

100.20%

47 

[illegible]

-0.16 

0.08 

99.75% 

97 

[illegible]

0.17 

0.01 

99.78% 

48 

[illegible]

-0.22 

-0.01 

98 

0.30 

0.14 

0.00 

100.08% 

49 

[illegible]

0.59

-0.12 

99.92% 

99 

-0.04 

-0.24 

-0.01 

99.66% 

50 

0.74

-0.19 

0.02 

100.30% 

100 

0.09 

0.06 

0.05 

100.08% 

Rows  and  columns  are  in  pixels,  angle  is  in  degrees,  and  scale  is  percent  of  FTA  1  scale. 


The values in the table in Appendix B represent the additional changes made to each image by the FTA 2
algorithm over the FTA 1 algorithm. For example, frame 2 was shifted an additional -0.03 pixels in row and an
additional 0.17 pixels in column, and was rotated an additional -0.03 degrees over the FTA 1 output. In addition to
these corrections for frame 2, the FTA 2 algorithm reduced its output to be 99.93% of the scale of the FTA 1 output.
Appendix C. Additional Stabilization Algorithm

An additional commercial stabilization algorithm was developed by the David Sarnoff Research Center. This
algorithm is an innovative extension to an optical flow technique which uses image pyramids to estimate motion
regions in images. An assumption fundamental to most optical flow algorithms is that motion at any one point may
be represented by a simple translation [17, 20]. It is assumed that even a complex motion will appear as a uniform
translation when viewed through a sufficiently small window. Bergen et al. [6] found that this assumption was not
valid along the boundary between two differently moving image regions.

Bergen proposed an alternative formulation to the traditional single motion component optical flow algorithm
[10, 21, 22, 23]. This algorithm allows for two distinct patterns to be undergoing affine motion within a given local
analysis region. The algorithm is iterative, estimating a region's motion, then removing that region through a nulling
procedure. This allows for a more precise estimation of the remaining motion components. This algorithm runs in
simulation on Sun computers, and there are currently plans to implement this set of algorithms on Sarnoff's custom
VFE hardware or on a new custom hardware set.


Multi-motion  Algorithm 

The Sarnoff algorithm assumes that the image frame contains two independent patterns (P(x,y) and Q(x,y))
which have independent motions of p(x,y) and q(x,y). If a direct extension of single motion estimation is attempted,
it becomes necessary to first estimate the derivatives of both patterns. The problem arises in that in order to estimate
these derivatives, the patterns must first be separated or segmented. The Sarnoff approach eliminates the need to
segment the image prior to simultaneously estimating the two motion components.

Two-Component  Motion  Model 

The Sarnoff two-component motion model assumes that within a region R, the image I may be represented by a
combined function of P(x,y) and Q(x,y):

$$I(x, y, 0) = P(x, y) \oplus Q(x, y)$$

and

$$I(x, y, t) = P^{tp} \oplus Q^{tq} \qquad \text{(C.1)}$$

where $P^{tp}$ denotes the pattern P transformed by the motion $tp$.


The symbol $\oplus$ represents an operator such as addition or multiplication that combines the two patterns. For
example, in the case of the boundary between two motion patterns, the region may be represented by the sum of two
patterns that are defined over the entire analysis region, but which have zero amplitude over complementary portions
of the region.
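
The boundary case can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not part of the Sarnoff algorithm: the function name `boundary_region`, the 8 x 8 region size, the random patterns, and the choice of a vertical boundary are all made up for the example, and the combination rule is taken to be addition.

```python
import numpy as np


def boundary_region(p_full, q_full, p_mask):
    """Build I = P + Q where P and Q are nonzero on complementary parts.

    p_mask is True where the region belongs to pattern P. Both returned
    patterns are defined over the whole region, but each is zero where
    the other one is present.
    """
    p = np.where(p_mask, p_full, 0.0)  # P has zero amplitude outside its part
    q = np.where(p_mask, 0.0, q_full)  # Q has zero amplitude inside P's part
    return p, q, p + q


rng = np.random.default_rng(0)
p_full = rng.standard_normal((8, 8))
q_full = rng.standard_normal((8, 8))
p_mask = np.zeros((8, 8), dtype=bool)
p_mask[:, :4] = True  # left half belongs to P, right half to Q

p, q, image = boundary_region(p_full, q_full, p_mask)
print(np.array_equal(image[:, :4], p_full[:, :4]))  # left half is pure P
print(np.array_equal(image[:, 4:], q_full[:, 4:]))  # right half is pure Q
```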
Estimating Two Motions

The key observation in the Sarnoff approach is that if one of the motion components (p or q) and the combination
rule ($\oplus$) are known, then that motion component may be removed. The image motion may then be solved by the
traditional one-motion algorithm without determining the actual patterns (P and Q). We will assume that the
combination rule is addition.

Let us assume that the motion p is known, and that we must determine the motion q. The pattern P, moving at
velocity p, may be removed by shifting each frame by $p\,\Delta t$ and subtracting it from the following frame. The
resulting sequence will be void of any contribution from pattern P and may be assumed to contain only patterns
moving with velocity q. This difference image sequence may be defined by


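$$D_t = I(x, y, t+1) - I^{p}(x, y, t) = \left(Q^{q} - Q^{p}\right)^{tq} \qquad \text{(C.2)}$$

where $I^{p}(x, y, t)$ denotes frame $t$ shifted by the motion p. With addition as the combination rule, the shifted
frame is $P^{(t+1)p} + Q^{tq+p}$ and the following frame is $P^{(t+1)p} + Q^{(t+1)q}$, so the P terms cancel in the
difference.

As is shown in equation (C.2), the sequence may now be represented by the new pattern $Q^{q} - Q^{p}$
moving with a single motion velocity q. We may therefore solve for the image motion using the traditional
single-motion model. Note that this procedure removes one of the patterns from the image sequence without explicitly
determining what that pattern is.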
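
The shift-and-subtract step can be checked numerically. The sketch below is a toy under stated assumptions, not the Sarnoff implementation: the patterns are random arrays, the motions are integer translations applied as circular shifts with `numpy.roll`, the combination rule is addition, and the helper names `shift`, `frame`, and `difference` are hypothetical.

```python
import numpy as np


def shift(img, v):
    """Circularly shift img by the integer vector v = (rows, cols)."""
    return np.roll(img, v, axis=(0, 1))


def frame(P, Q, p, q, t):
    """Frame t of the additive model: pattern P moved by t*p plus Q moved by t*q."""
    return shift(P, (t * p[0], t * p[1])) + shift(Q, (t * q[0], t * q[1]))


def difference(P, Q, p, q, t):
    """Shift frame t by the known motion p and subtract it from frame t + 1."""
    return frame(P, Q, p, q, t + 1) - shift(frame(P, Q, p, q, t), p)


rng = np.random.default_rng(1)
P = rng.standard_normal((32, 32))
Q = rng.standard_normal((32, 32))
p, q = (2, -1), (-1, 3)  # made-up motions in (rows, cols) per frame

d0 = difference(P, Q, p, q, 0)
d1 = difference(P, Q, p, q, 1)

# P has cancelled: the first difference image depends on Q alone.
print(np.allclose(d0, shift(Q, q) - shift(Q, p)))
# The difference sequence moves with the single motion q.
print(np.allclose(d1, shift(d0, q)))
```

Neither check uses any knowledge of P beyond its motion, which is the point of the procedure: the pattern is removed without being identified.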


In the real world, the motions p and q are not normally known. In fact, estimates of these motions may not even
exist. However, by using an alternating iterative refinement procedure, both motions may be recovered. This is true
even if initial estimates of p = 0 and q = 0 are used. In this procedure, estimates alternate between p and q. Therefore,
p is obtained on even-numbered cycles while q is obtained on odd cycles. These iterative cycles are repeated until
$\Delta p = p_n - p_{n+1}$ and $\Delta q = q_n - q_{n+1}$ are sufficiently small.
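
The alternating refinement can be sketched on synthetic data. This is a simplified stand-in rather than the Sarnoff algorithm itself: the single-motion estimator here is a brute-force search over integer shifts (the traditional single-motion model is a different, gradient-based technique), motions are circular integer translations, the combination rule is addition, and pattern P is deliberately given the larger amplitude so that the first cycle locks onto p. All function names are hypothetical.

```python
import numpy as np


def shift(img, v):
    """Circularly shift img by the integer vector v = (rows, cols)."""
    return np.roll(img, v, axis=(0, 1))


def estimate_single_motion(a, b, max_shift=4):
    """Integer shift s minimizing the squared error between shift(a, s) and b."""
    best, best_err = (0, 0), np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            err = np.sum((shift(a, (dr, dc)) - b) ** 2)
            if err < best_err:
                best, best_err = (dr, dc), err
    return best


def estimate_other_motion(frames, known):
    """Null the component moving with `known`, then estimate what remains."""
    d1 = frames[1] - shift(frames[0], known)
    d2 = frames[2] - shift(frames[1], known)
    return estimate_single_motion(d1, d2)


def estimate_two_motions(frames, max_cycles=10):
    """Alternate between p (even cycles) and q (odd cycles), starting from zero."""
    p, q = (0, 0), (0, 0)
    unchanged = 0
    for cycle in range(max_cycles):
        if cycle % 2 == 0:
            new = estimate_other_motion(frames, q)
            changed = new != p
            p = new
        else:
            new = estimate_other_motion(frames, p)
            changed = new != q
            q = new
        # Stop once a p cycle and a q cycle in a row both leave the estimates alone.
        unchanged = 0 if changed else unchanged + 1
        if unchanged == 2:
            break
    return p, q


rng = np.random.default_rng(0)
P = 3.0 * rng.standard_normal((64, 64))  # dominant pattern (assumption)
Q = rng.standard_normal((64, 64))
true_p, true_q = (2, -1), (-1, 3)        # made-up motions, rows and cols per frame
frames = [
    shift(P, (t * true_p[0], t * true_p[1])) + shift(Q, (t * true_q[0], t * true_q[1]))
    for t in range(3)
]

p_est, q_est = estimate_two_motions(frames)
print(p_est, q_est)
```

With integer motions the stopping test is an exact comparison; with subpixel estimates it would become a threshold on the size of the change in p and q, as in the text.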


Although this algorithm runs in simulation on a Sun computer platform, it was not available for characterization.